A Hybrid Text Classification System Using Sentential Frequent Itemsets
Abstract
Text classification techniques mostly rely on single-term analysis of the document data set, yet many concepts, especially specific ones, are conveyed by sets of terms. To build a more accurate text classifier, more informative features, including frequently co-occurring words in the same sentence and their weights, are particularly important. In this paper, we propose a novel approach to text classification using sentential frequent itemsets, a concept that comes from association rule mining. The approach views a sentence rather than a document as a transaction and uses a variable precision rough set based method to evaluate each sentential frequent itemset's contribution to the classification. Experiments over the Reuters corpus validate the practicability of the proposed system.
Key-Words: text classification, sentential frequent itemsets, variable precision rough set model.
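The idea of treating each sentence as a transaction can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sentence_transactions` and `frequent_itemsets`, the regular-expression tokenizer, and the `min_support` and `max_size` parameters are all assumptions made for illustration. The variable precision rough set weighting mentioned in the abstract is not shown.

```python
import re
from collections import Counter
from itertools import combinations


def sentence_transactions(text):
    """Split text into sentences; return each sentence as a set of lowercase words.

    Assumption: a naive split on '.', '!' and '?' stands in for real
    sentence segmentation.
    """
    transactions = []
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words:
            transactions.append(words)
    return transactions


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: count} for word sets found in >= min_support sentences.

    Only words that are frequent on their own are combined, since every
    subset of a frequent itemset must itself be frequent.
    """
    word_counts = Counter(w for t in transactions for w in t)
    frequent_words = {w for w, c in word_counts.items() if c >= min_support}
    result = {frozenset([w]): word_counts[w] for w in frequent_words}

    for size in range(2, max_size + 1):
        counts = Counter()
        for t in transactions:
            candidates = sorted(t & frequent_words)
            for combo in combinations(candidates, size):
                counts[frozenset(combo)] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break
        result.update(level)
    return result


sample = (
    "Oil prices rose. Crude oil prices fell. "
    "The bank raised rates. Oil prices are volatile."
)
itemsets = frequent_itemsets(sentence_transactions(sample), min_support=3)
for itemset, count in sorted(itemsets.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

In this toy sample, "oil" and "prices" co-occur in three of the four sentences, so the pair is reported as a sentential frequent itemset. If the whole document were treated as one transaction, co-occurrence within a sentence could not be distinguished from co-occurrence anywhere in the text.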
Similar Resources
Performance Evaluation of an Efficient Frequent Item sets-Based Text Clustering Approach
The vast amount of textual information available in electronic form is growing at a staggering rate in recent times. The task of mining useful or interesting frequent itemsets (words/terms) from very large text databases that are formed as a result of the increasing number of textual data still seems to be a quite challenging task. A great deal of attention in research community has been receiv...
Hybrid Approach for Punjabi Text Clustering
Text clustering is a text mining technique that groups similar documents into a single cluster using some similarity measure and places dissimilar documents into different clusters. Most popular clustering algorithms treat a document as a conglomeration of words and do not consider the syntactic or semantic relations between words. To overcome this drawback, some algori...
Text clustering using frequent itemsets
Frequent itemsets originate from association rule mining. Recently, they have been applied in text mining tasks such as document categorization and clustering. In this paper, we conduct a study on text clustering using frequent itemsets. The main contribution of this paper is threefold. First, we present a review of existing methods of document clustering using frequent patterns. Second, a new m...
An Efficient Approach for Text Clustering Based on Frequent Itemsets
In recent times, the vast amount of textual information available in electronic form has been growing at a staggering rate. This increasing volume of textual data has led to the task of mining useful or interesting frequent itemsets (words/terms) from very large text databases, which still seems quite challenging. The use of such frequent itemsets for text clustering has received a great deal of ...
Using attribute value lattice to find closed frequent itemsets
Finding all closed frequent itemsets is a key step of association rule mining, since the non-redundant association rules can be inferred from all the closed frequent itemsets. In this paper we present a new method for finding closed frequent itemsets based on an attribute value lattice. In the new method, we argue that vertical data representation and attribute value lattice can find all closed freq...